 interface prediction



End-to-End Learning on 3D Protein Structure for Interface Prediction

Neural Information Processing Systems

Despite an explosion in the number of experimentally determined, atomically detailed structures of biomolecules, many critical tasks in structural biology remain data-limited. Whether performance in such tasks can be improved by using large repositories of tangentially related structural data remains an open question. To address this question, we focused on a central problem in biology: predicting how proteins interact with one another, that is, which surfaces of one protein bind to those of another protein. We built a training dataset, the Database of Interacting Protein Structures (DIPS), that contains biases but is two orders of magnitude larger than those used previously. We found that these biases significantly degrade the performance of existing methods on gold-standard data. Hypothesizing that assumptions baked into the hand-crafted features on which these methods depend were the source of the problem, we developed the first end-to-end learning model for protein interface prediction, the Siamese Atomic Surfacelet Network (SASNet). Using only spatial coordinates and identities of atoms, SASNet outperforms state-of-the-art methods trained on gold-standard structural data, even when trained on only 3% of our new dataset.
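To make "only spatial coordinates and identities of atoms" concrete, the sketch below shows one plausible way such raw input could be turned into a tensor: atoms near a chosen point are binned into a small 3D grid with one channel per element. This is an illustration only, not the authors' implementation. The function name `voxelize`, the element list `ELEMENTS`, the grid size, and the resolution are all assumptions that do not come from the abstract.

```python
import numpy as np

# Hypothetical element channels; the abstract does not say which are used.
ELEMENTS = ("C", "N", "O", "S")


def voxelize(coords, elements, center, grid_size=8, resolution=1.0):
    """Bin atoms into a (len(ELEMENTS), G, G, G) occupancy grid.

    coords:   (n_atoms, 3) positions, in the same length unit as resolution.
    elements: sequence of n_atoms element symbols.
    center:   (3,) point on which the cube is centred.
    """
    coords = np.asarray(coords, dtype=float)
    center = np.asarray(center, dtype=float)
    grid = np.zeros((len(ELEMENTS),) + (grid_size,) * 3, dtype=np.float32)
    half = grid_size * resolution / 2.0
    # Shift so the cube spans [0, grid_size * resolution) on every axis.
    idx = np.floor((coords - center + half) / resolution).astype(int)
    for (i, j, k), element in zip(idx, elements):
        inside = 0 <= i < grid_size and 0 <= j < grid_size and 0 <= k < grid_size
        # Atoms outside the cube or of an unlisted element are skipped.
        if inside and element in ELEMENTS:
            grid[ELEMENTS.index(element), i, j, k] = 1.0
    return grid


if __name__ == "__main__":
    coords = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [100.0, 0.0, 0.0]]
    grid = voxelize(coords, ["C", "N", "O"], center=[0.0, 0.0, 0.0])
    print(grid.shape, int(grid.sum()))
```

In a siamese setup, the same function (and the same downstream network weights) would be applied to one region from each of the two proteins, and the two resulting representations combined to score the pair.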





Reviews: End-to-End Learning on 3D Protein Structure for Interface Prediction

Neural Information Processing Systems

The authors propose the first end-to-end learning model for protein interface prediction, the Siamese Atomic Surfacelet Network (SASNet). The novelty of the method is that it uses only spatial coordinates and identities of atoms as inputs, instead of relying on hand-crafted features. The authors also introduce the Database of Interacting Protein Structures (DIPS), which increases the number of binary protein interactions by two orders of magnitude over the previously used dataset (DB5). SASNet outperforms state-of-the-art methods when trained on the much larger DIPS dataset and remains comparable when trained on the DB5 dataset, showing robustness when trained on bound or unbound proteins. The paper is very well written and easy to follow.


Revealing data leakage in protein interaction benchmarks

Bushuiev, Anton, Bushuiev, Roman, Sedlar, Jiri, Pluskal, Tomas, Damborsky, Jiri, Mazurenko, Stanislav, Sivic, Josef

arXiv.org Artificial Intelligence

In recent years, there has been remarkable progress in machine learning for protein-protein interactions. However, prior work has predominantly focused on improving learning algorithms, with less attention paid to evaluation strategies and data preparation. Here, we demonstrate that further development of machine learning methods may be hindered by the quality of existing train-test splits. Specifically, we find that commonly used splitting strategies for protein complexes, based on protein sequence or metadata similarity, introduce major data leakage. This may result in overoptimistic evaluation of generalization, as well as unfair benchmarking of the models, biased towards assessing their overfitting capacity rather than practical utility. To overcome the data leakage, we recommend constructing data splits based on 3D structural similarity of protein-protein interfaces and suggest corresponding algorithms. We believe that addressing the data leakage problem is critical for further progress in this research area. The vast majority of protein-protein interactions remain undiscovered.
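The recommendation to split by interface similarity can be illustrated with a small sketch: group items whose pairwise similarity exceeds a threshold, then assign whole groups to either train or test so that no similar pair straddles the split. This is not the algorithm proposed in the paper, which the abstract does not describe. The function name `cluster_split`, the precomputed similarity matrix, the threshold, and the smallest-clusters-first assignment are all assumptions made for illustration.

```python
import numpy as np


def cluster_split(similarity, threshold=0.5, test_fraction=0.2):
    """Split item indices into train/test without similar pairs across sets.

    similarity: (n, n) symmetric matrix of hypothetical interface
                similarity scores, higher meaning more similar.
    """
    similarity = np.asarray(similarity, dtype=float)
    n = similarity.shape[0]
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link every pair at or above the threshold (single-linkage clusters).
    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i, j] >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)

    # Fill the test set with the smallest clusters that still fit.
    target = int(round(test_fraction * n))
    train, test = [], []
    for members in sorted(clusters.values(), key=len):
        if len(test) + len(members) <= target:
            test.extend(members)
        else:
            train.extend(members)
    return sorted(train), sorted(test)


if __name__ == "__main__":
    sim = np.eye(5)
    sim[0, 1] = sim[1, 0] = 0.9
    sim[2, 3] = sim[3, 2] = 0.8
    print(cluster_split(sim, threshold=0.5, test_fraction=0.4))
```

Because clusters are never divided, the test set can end up smaller than `test_fraction` suggests; that is the price of keeping similar interfaces on the same side of the split.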


Deep Learning of High-Order Interactions for Protein Interface Prediction

Liu, Yi, Yuan, Hao, Cai, Lei, Ji, Shuiwang

arXiv.org Machine Learning

Protein interactions are important in a broad range of biological processes. Traditionally, computational methods have been developed to automatically predict protein interfaces from hand-crafted features. Recent approaches employ deep neural networks and predict the interaction of each amino acid pair independently. However, these methods do not incorporate the important sequential information from amino acid chains or the high-order pairwise interactions. Intuitively, the prediction for an amino acid pair should depend on both the features of that pair and the information from other amino acid pairs. In this work, we propose to formulate protein interface prediction as a 2D dense prediction problem. In addition, we propose a novel deep model that incorporates sequential information and high-order pairwise interactions to perform interface prediction. We represent proteins as graphs and employ graph neural networks to learn node features. We then propose a sequential modeling method to incorporate the sequential information and reorder the feature matrix. Next, we incorporate high-order pairwise interactions to generate a 3D tensor containing different pairwise interactions. Finally, we employ convolutional neural networks to perform 2D dense predictions. Experimental results on multiple benchmarks demonstrate that our proposed method consistently improves protein interface prediction performance.
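The step from per-residue node features to a 3D tensor suitable for 2D dense prediction can be sketched as follows. Given feature matrices for the two proteins, each cell (i, j) of the tensor holds the features of residue i from one protein and residue j from the other. Concatenation is only one simple choice and is an assumption here; the abstract does not specify how the pairwise interactions are combined, and the name `pairwise_tensor` is hypothetical.

```python
import numpy as np


def pairwise_tensor(feats_a, feats_b):
    """Build an (n, m, d_a + d_b) tensor of residue-pair features.

    feats_a: (n, d_a) node features for the first protein.
    feats_b: (m, d_b) node features for the second protein.
    """
    feats_a = np.asarray(feats_a, dtype=float)
    feats_b = np.asarray(feats_b, dtype=float)
    n, m = feats_a.shape[0], feats_b.shape[0]
    # Repeat each protein's features along the other protein's axis.
    a = np.broadcast_to(feats_a[:, None, :], (n, m, feats_a.shape[1]))
    b = np.broadcast_to(feats_b[None, :, :], (n, m, feats_b.shape[1]))
    # Cell (i, j) is the features of residue i followed by those of residue j.
    return np.concatenate([a, b], axis=-1)


if __name__ == "__main__":
    a = np.arange(6).reshape(3, 2)
    b = np.arange(8).reshape(4, 2) + 10
    print(pairwise_tensor(a, b).shape)
```

Treating the first two axes as image height and width and the last as channels, a 2D convolutional network can then produce one interaction score per residue pair, with each score informed by neighbouring pairs.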


End-to-End Learning on 3D Protein Structure for Interface Prediction

Townshend, Raphael, Bedi, Rishi, Suriana, Patricia, Dror, Ron

Neural Information Processing Systems

Despite an explosion in the number of experimentally determined, atomically detailed structures of biomolecules, many critical tasks in structural biology remain data-limited. Whether performance in such tasks can be improved by using large repositories of tangentially related structural data remains an open question. To address this question, we focused on a central problem in biology: predicting how proteins interact with one another, that is, which surfaces of one protein bind to those of another protein. We built a training dataset, the Database of Interacting Protein Structures (DIPS), that contains biases but is two orders of magnitude larger than those used previously. We found that these biases significantly degrade the performance of existing methods on gold-standard data.